bimodality is maximised. Suppose $\mathbf{X} = \mathbf{X}_A \cup \mathbf{X}_B$,
where A and B stand for two classes, for instance, a class of
non-cleaved peptides and a class of cleaved peptides. Thus a space
can be decomposed as follows:
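$$\mathbf{X}^{t}\mathbf{w} = \mathbf{X}_A^{t}\mathbf{w} \cup \mathbf{X}_B^{t}\mathbf{w} = \hat{\mathbf{y}}_A \cup \hat{\mathbf{y}}_B \tag{3.4}$$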
$\hat{\mathbf{y}}_A$ and $\hat{\mathbf{y}}_B$ will form two densities. It is expected that a
mixture of $\hat{\mathbf{y}}_A$ and $\hat{\mathbf{y}}_B$ should be bimodal, meaning that the
density of $\hat{\mathbf{y}}_A$ and the density of $\hat{\mathbf{y}}_B$ are well separated from
each other so as to be able to discriminate between the two classes of
data points, i.e., the peptides in protease cleavage pattern discovery.
This requires the bimodality of the mixture density of $\hat{\mathbf{y}}_A$ and
$\hat{\mathbf{y}}_B$ to be maximised. In other words, a discriminant analysis needs
to maximise the distance between the densities of $\hat{\mathbf{y}}_A$ and $\hat{\mathbf{y}}_B$
and to minimise the overlap between the two densities of $\hat{\mathbf{y}}_A$ and
$\hat{\mathbf{y}}_B$ to generate an optimal LDA model. Unless these conditions are
well satisfied, an LDA model will not work well.
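
The two requirements above (a large distance between the projected class densities and a small overlap between them) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the text: the synthetic two-class data, the helper names `project` and `separation`, and the use of a Fisher-style ratio (squared distance between the projected class means divided by the sum of the projected class variances) as one possible measure of separation.

```python
import random
import statistics


def project(points, w, w0=0.0):
    """Return the projections y_hat = w0 + w . x for each point x."""
    return [w0 + sum(wi * xi for wi, xi in zip(w, x)) for x in points]


def separation(y_a, y_b):
    """Fisher-style ratio (an assumed measure, not the book's):
    squared gap between the projected class means divided by the
    sum of the projected class variances."""
    gap = statistics.mean(y_a) - statistics.mean(y_b)
    spread = statistics.variance(y_a) + statistics.variance(y_b)
    return gap * gap / spread


# Hypothetical 2-D data: class A centred at (0, 0), class B at (4, 0).
rng = random.Random(0)
X_A = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]
X_B = [(rng.gauss(4.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]

# Two candidate projection directions.
w_good = (1.0, 0.0)  # along the axis on which the class means differ
w_poor = (0.0, 1.0)  # along the axis on which the classes coincide

score_good = separation(project(X_A, w_good), project(X_B, w_good))
score_poor = separation(project(X_A, w_poor), project(X_B, w_poor))

print(f"separation along w_good: {score_good:.2f}")
print(f"separation along w_poor: {score_poor:.2f}")
```

In this sketch the direction along which the class means differ gives well-separated projections and a much larger score, while the other direction mixes the two classes into a single cluster of projections.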
A linear discriminant analysis model is in the format shown in
equation (3.2). The values of $\hat{y}$ are called the projected data or the
projections onto the projection direction, which are continuous values.
An LDA model is a parametric model because the projection direction is
parameterised by $w_1, w_2, \cdots$ and $w_d$.
The projection direction optimisation
Because the values of $\hat{y}$ are determined by $\mathbf{w}$, the density of $\hat{y}$
will vary with $\mathbf{w}$. Figure 3.1 shows a classification problem, where the
classifier is defined by $\hat{y} = w_0 + w_1 x_1 + w_2 x_2$. There are two
scenarios (hence two models) of this classifier, in which the model
parameters $\mathbf{w}$ take different values in the two panels. The insets
show two different density patterns of the projections $\hat{y}$. The
histogram density of $\hat{y}$ shown in Figure 3.1(a) is bimodal. However,
the histogram density of $\hat{y}$ shown in Figure 3.1(b) is unimodal. If the
two panels (two models) of Figure 3.1 are compared, it can be seen that
the key to a bimodal distribution of projections is to optimise the projection